Fuzzy Perceptive Values for MDPs with Discounting
Abstract
In this paper, we formulate the fuzzy perceptive model for discounted Markov decision processes in which the perception for transition probabilities is described by fuzzy sets. The optimal expected reward, called a fuzzy perceptive value, is characterized and calculated by a new fuzzy relation. As a numerical example, a machine maintenance problem is considered.
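As a rough illustration of the setting only, and not the fuzzy relation the paper uses, the sketch below runs ordinary value iteration on a hypothetical two-state machine maintenance model and scans a grid over the alpha-cut of a triangular fuzzy transition probability. Every number, the state and action layout, the triangular membership shape, and the names `optimal_value`, `alpha_cut`, and `value_range` are assumptions made for this example. The grid scan is a brute-force approximation of the range of optimal discounted rewards at a given membership level.

```python
from typing import List, Tuple

# Hypothetical model (all numbers are illustrative assumptions, not from the paper).
# States: 0 = machine working, 1 = machine failed.
# Actions: 0 = operate, 1 = repair.
BETA = 0.9  # discount factor

REWARD = [
    [3.0, 1.0],   # working: operate earns 3, preventive repair earns 1
    [0.0, -1.0],  # failed: operate earns 0, repair costs 1
]


def transition(p: float) -> List[List[List[float]]]:
    """Return P[s][a] = [prob. of next state 0, prob. of next state 1].

    p is the (imprecisely perceived) probability that a working machine
    is still working after one period of operation.
    """
    return [
        [[p, 1.0 - p], [1.0, 0.0]],  # from state 0: operate, repair
        [[0.0, 1.0], [0.8, 0.2]],    # from state 1: operate, repair
    ]


def optimal_value(p: float, tol: float = 1e-10) -> List[float]:
    """Standard value iteration for the crisp discounted MDP with parameter p."""
    P = transition(p)
    v = [0.0, 0.0]
    while True:
        new_v = [
            max(
                REWARD[s][a] + BETA * sum(P[s][a][t] * v[t] for t in range(2))
                for a in range(2)
            )
            for s in range(2)
        ]
        if max(abs(new_v[s] - v[s]) for s in range(2)) < tol:
            return new_v
        v = new_v


def alpha_cut(alpha: float, left: float, peak: float, right: float) -> Tuple[float, float]:
    """Alpha-cut [lo, hi] of a triangular fuzzy number (left, peak, right)."""
    return left + alpha * (peak - left), right - alpha * (right - peak)


def value_range(alpha: float, grid: int = 51) -> Tuple[float, float]:
    """Approximate the range of optimal values in state 0 over the alpha-cut of p."""
    lo, hi = alpha_cut(alpha, 0.6, 0.8, 0.9)  # assumed fuzzy perception of p
    values = [
        optimal_value(lo + (hi - lo) * k / (grid - 1))[0] for k in range(grid)
    ]
    return min(values), max(values)


if __name__ == "__main__":
    for alpha in (0.0, 0.5, 1.0):
        print(alpha, value_range(alpha))
```

Under these assumptions, the intervals returned for increasing `alpha` are nested and shrink to a single value at `alpha = 1`, which is the shape one would expect from a fuzzy number describing the optimal expected reward.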
Related resources
Fuzzy optimality relation for perceptive MDPs - the average case
This paper is a sequel to Kurano et al. [9], [10], in which fuzzy perceptive models for optimal stopping and discounted Markov decision processes are given. We proposed a method of computing the corresponding fuzzy perceptive values. Here, we deal with the average case for Markov decision processes with fuzzy perceptive transition matrices and characterize the optimal average expected reward, ca...
Fuzzy Optimality Equations for Perceptive MDPs
This paper is a sequel to Kurano et al. [9], [10], in which fuzzy perceptive models for optimal stopping and discounted Markov decision processes are proposed and methods of computing the corresponding fuzzy perceptive values are given. Here, we deal with the average case for Markov decision processes with fuzzy perceptive transition matrices and characterize the optimal average expected rew...
Perceptive Evaluation for the Optimal Discounted Reward in Markov Decision Processes
We formulate a fuzzy perceptive model for Markov decision processes with discounted payoff in which the perception for transition probabilities is described by fuzzy sets. Our aim is to evaluate the optimal expected reward, which is called a fuzzy perceptive value, based on the perceptive analysis. It is characterized and calculated by a certain fuzzy relation. A machine maintenance problem is ...
Q learning with finite trials
The standard reinforcement learning model is powerful enough to deal with never-ending trials. By slightly discounting rewards obtained in the future, an infinite walk in the environment is still guaranteed to have a finite expected future reward. This, however, comes at a price. The discounting may corrupt estimates of the expected return in ending trials. Also in most cases algorithms that can d...
A Genetic Search In Policy Space For Solving Markov Decision Processes
Markov Decision Processes (MDPs) have been studied extensively in the context of decision making under uncertainty. This paper presents a new methodology for solving MDPs, based on genetic algorithms. In particular, the importance of discounting in the new framework is dealt with and applied to a model problem. Comparison with the policy iteration algorithm from dynamic programming reveals the ...